Abstract

Background: The gamma-gliadins are considered to be the oldest of the gliadin family of storage proteins in 
Aegilops/Triticum. However, the expansion of this multigene family has not been studied in an evolutionary 
perspective. 

Results: We have cloned 59 gamma-gliadin genes from Aegilops and Triticum species (Aegilops caudata L., Aegilops
comosa Sm. in Sibth. & Sm., Aegilops mutica Boiss., Aegilops speltoides Tausch, Aegilops tauschii Coss., Aegilops
umbellulata Zhuk., Aegilops uniaristata Vis., and Triticum monococcum L.) representing eight different genomes: Am,
B/S, C, D, M, N, T and U. Overall, 15% of the sequences contained internal stop codons resulting in pseudogenes, 
but this percentage was variable among genomes, up to over 50% in Ae. umbellulata. The most common length of 
the deduced protein, including the signal peptide, was 302 amino acids, but the length varied from 215 to 362 
amino acids, both obtained from Ae. speltoides. Most genes encoded proteins with eight cysteines. However, all 
Aegilops species had genes that encoded a gamma-gliadin protein of 302 amino acids with an additional cysteine. 
These conserved nine-cysteine gamma-gliadins may perform a specific function, possibly as chain terminators in 
gluten network formation in protein bodies during endosperm development. A phylogenetic analysis of 
gamma-gliadins derived from Aegilops and Triticum species and the related genera Lophopyrum, Crithopsis, and 
Dasypyrum showed six groups of genes. Most Aegilops species contained gamma-gliadin genes from several of 
these groups, which also included sequences from the genera Lophopyrum, Crithopsis, and Dasypyrum. Hordein and 
secalin sequences formed separate groups. 

Conclusions: We present a model for the evolution of the gamma-gliadins from which we deduce that the most 
recent common ancestor (MRCA) of Aegilops/Triticum-Dasypyrum-Lophopyrum-Crithopsis already had four groups of 
gamma-gliadin sequences, presumably the result of two rounds of duplication of the locus.

Background

Prolamin storage proteins are produced in large 
amounts in the developing endosperm of Triticeae spe- 
cies. These storage proteins are a complex mixture of 
alpha/beta-, gamma- and omega-gliadins and high- and 
low molecular weight glutenins, collectively called 'glu- 
ten' in wheat. They are encoded by medium to large 
multigene families. For example, the alpha-gliadins are
encoded by a complex gene family with estimates for
copy number that range from 25-35 copies [1] to 100 
[2] or even 150 copies [3] per haploid genome, most of 
which (72-95%) are pseudogenes [3,4]. Sequence similar- 
ity of alpha-gliadins from bread wheat to alpha-gliadins 
from diploid Aegilops/Triticum species, which are close 
relatives of the diploid ancestors of bread wheat, demon- 
strated that there are three distinct groups of alpha-glia- 
dins, one for each of the three homoeologous loci in 
hexaploid bread wheat [4]. This is consistent with the 
notion that the expansion of this gene family took place 
after the ancestors of the different genomes of Aegilops/ 
Triticum became separated. 



© 2012 Goryunova et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the 
Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use,
distribution, and reproduction in any medium, provided the original work is properly cited. 



Goryunova et al. BMC Evolutionary Biology 2012, 12:215
http://www.biomedcentral.com/1471-2148/12/215






The gamma-gliadins are considered to be the most ancient
of the gliadins and LMW-glutenins [5]. In bread wheat
they are encoded by the homoeologous Gli-1 loci
(Gli-A1, Gli-B1 and Gli-D1), located on the short arms
of the homoeologous group 1 chromosomes [6,7]. In the
variety Chinese Spring the number of gamma-gliadin genes
was preliminarily estimated at 15-40 [8,9] and, in contrast
to the situation in alpha-gliadins, only a small fraction
(~14%) of the gamma-gliadin genes in hexaploid bread
wheat consisted of pseudogenes [10]. Nevertheless,
sequence analysis showed that the gamma-gliadins form a
highly diverse gene family [9,10].

The large majority of the gamma-gliadin sequences
available in Genbank are from tetraploid Triticum
durum (A and B genomes), hexaploid Triticum aestivum
(A, B and D genomes), diploid Triticum monococcum
(A genome) and diploid Aegilops species with S and
D genomes (the B genome is closely related to the S
genome of Aegilops speltoides [11,12]). Using such a
collection of gamma-gliadin sequences, Qi et al. [10]
classified gamma-gliadins into 17 subgroups, most of which
had 8 cysteine residues per protein, but 7, 9, and 10
residues also occurred. The cysteine residues form disulphide
bridges, and proteins with an odd number of cysteines
can covalently bind to a network of HMW glutenins and
other gluten proteins [13]. Of these 17 subgroups, those
with A genome gamma-gliadins appeared to be distinct
from the subgroups that contain B (S) and/or D genome
genes. As only these three diploid progenitor genomes
were included, the study did not provide insight into the
evolutionary history of the gamma-gliadins. Wang et al.
[14] recognised four groups of gamma-gliadins.

Although wheat storage proteins form multigene
families, their phylogeny can be established effectively
using knowledge of the phylogenetic and evolutionary
relationships among Triticum and Aegilops genomes.
Zhang et al. [15] and Li et al. [16] studied the HMW
glutenin subunits, whereas Zhang et al. [17] and
Wang et al. [18] focused on LMW glutenin subunits.
From this it appears that, in the case of multigene
families, it may be necessary to infer relationships at
the level of groups of closely related genes rather
than for individual genes.

Here we have studied the evolution of gamma-gliadins.
For this we have complemented the available gamma-gliadin
sequences from diploid Aegilops/Triticum species
with novel sequences from diploid species representing
the other main genome types in Aegilops/Triticum: the
C, M, N, U, and T genomes. Our analysis of these genes
shows that there are six groups of gamma-gliadins that
occur in different combinations across all the genomes.
We present a model for gene duplications and losses
that is consistent with our data. Our model indicates
that at least some gene duplications predate the most
recent common ancestor (MRCA) of all Aegilops/Triticum
genomes.

Methods 

Plant material 

In this paper we followed the classification of Van Slageren
[19] with the exception of Ae. mutica, which was regarded by
Van Slageren as a separate genus, Amblyopyrum (Jaub. &
Spach) Eig. We used accessions of 7 diploid Aegilops
species: Aegilops caudata L. (k-2255, Turkey, C genome),
Aegilops tauschii Coss. (k-1368, Uzbekistan, D), Aegilops
comosa Sm. in Sibth. & Sm. (k-2272, Asia Minor, M),
Aegilops uniaristata Vis. (k-650, Greece, N), Aegilops
speltoides Tausch (CGN10682 and CGN10684, S), Aegilops
mutica Boiss. (k-1581, Turkey, T), and Aegilops
umbellulata Zhuk. (k-1588, Afghanistan, U), as well as
Triticum monococcum L. (CGN10542, A). The accessions
starting with "k" were obtained from the All-Russian
Institute of Plant Industry (St. Petersburg, Russia).
CGN numbers are from the Centre for Genetic
Resources (Wageningen, The Netherlands). The set of
species represents all main genome types in Aegilops/
Triticum. Three of the species analysed have genomes
closely related to genomes of cultivated wheat, T.
durum (AB genome) and T. aestivum (ABD genome):
Ae. speltoides, Ae. tauschii, and T. monococcum.

Cloning and sequencing 

DNA was isolated from young fresh leaves using the
Edwards procedure modified by Dorokhov and Klocke
[20,21]. The primers used for amplification of gamma-gliadin
sequences were complementary to 3' and 5' conserved
regions of gamma-gliadins. The forward primer
ylF: 5'-atgaagaccttactcatcc-3' resides in the signal peptide,
the reverse primer yllR: 5'-ggacaWagacRttgcacatg-3'
in domain V. The PCR cycling conditions were: 5 min at
94°C followed by 24 cycles (94°C for 1 min, 53°C for 1
min, 72°C for 2 min) and 72°C for 10 min, in a 25 µl reaction
volume. The PCR products were cloned into the pCRII-TOPO
vector (Invitrogen) and sequenced using the M13
forward (5'-cgccagggttttcccagtcacgac-3') and reverse
primer (5'-agcggataacaatttcacacagga-3') and two additional
internal primers yFi2: 5'-ccc(ac)tgcaagaat(at)t(ct)c-3' and
yRi2: 5'-g(ag)a(at)attcttgca(gt)ggg-3'. This produced four
overlapping reads for each clone.
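
The two internal primers are printed with bracketed alternatives at degenerate positions. As a quick consistency check, the following minimal sketch treats each bracketed group as a set of bases and tests whether yRi2, as printed, is the reverse complement of yFi2, which would mean that both anneal at the same internal site on opposite strands. The helper names `parse_degenerate` and `reverse_complement` are hypothetical and chosen for illustration only; they are not part of the original study.

```python
import re

COMPLEMENT = {"a": "t", "c": "g", "g": "c", "t": "a"}

def parse_degenerate(primer):
    """Turn a primer such as 'ccc(ac)tg' into a list of base sets, one per position."""
    tokens = re.findall(r"\(([acgt]+)\)|([acgt])", primer.lower())
    return [frozenset(group or single) for group, single in tokens]

def reverse_complement(positions):
    """Reverse the order of positions and complement every base in each set."""
    return [frozenset(COMPLEMENT[b] for b in pos) for pos in reversed(positions)]

# Primer sequences as given in the text (5' to 3').
yFi2 = parse_degenerate("ccc(ac)tgcaagaat(at)t(ct)c")
yRi2 = parse_degenerate("g(ag)a(at)attcttgca(gt)ggg")

# True when both primers cover the same site on opposite strands.
same_site = reverse_complement(yFi2) == yRi2

# Number of distinct oligonucleotides represented by the degenerate primer.
n_variants = 1
for pos in yFi2:
    n_variants *= len(pos)

print(same_site, len(yFi2), n_variants)
```

The check only compares the printed sequences; it says nothing about annealing behaviour under the PCR conditions used.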

Sequence analysis 

The reads were merged per clone and the sequence data
were manually checked using SeqMan (DNASTAR) to
exclude sequencing mistakes. Sequences that were
suspected to be chimeric, that lacked 5' or 3' ends, or that
had a very long deletion (sequence length in the alignment
less than 600 bp) were excluded from the phylogenetic
analysis. Each PCR product was a mixture of
sequences from different genes, so many of the 11-81
clones obtained from one PCR reaction were independent.
However, some duplicate clones may be derived
from the same gene, possibly even from the same
amplification product with a particular PCR error. Therefore
all remaining 335 sequences were conservatively organized
into 59 contigs (sets of overlapping DNA
sequences) with 99% similarity. The consensus
sequences of the contigs thus obtained were used for
further statistical and phylogenetic analysis. One to three
sequences representing each consensus sequence were
submitted to Genbank. In total 69 novel gamma-gliadin
sequences were submitted, representing 59 contig
consensus sequences. The length of the partial gamma-gliadin
sequences obtained varied from 545 to 986 base
pairs and corresponded to a part of the full-length open
reading frame of gamma-gliadins, which is 648-1089 bp
in length. They encode gamma-gliadins of 215-362
amino acids. These sequences are probably not the
complete set of gamma-gliadin genes from each of the
accessions, but the aim was to clone a sufficient number
of genes from each accession to obtain representatives of
all distinct groups of gamma-gliadins for a phylogenetic
analysis, rather than a complete set of gamma-gliadin
genes and pseudogenes from all accessions.
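
To illustrate what grouping clones at a 99% similarity threshold implies, the sketch below clusters toy sequences greedily by pairwise identity. It is an assumption-laden illustration, not the SeqMan assembly procedure used in the study: the sequences are invented, they are assumed to be pre-aligned to equal length, and the function names `identity`, `greedy_cluster` and `mutate` are hypothetical.

```python
def identity(a, b):
    """Fraction of identical positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def greedy_cluster(seqs, threshold=0.99):
    """Put each sequence in the first cluster whose representative it matches."""
    clusters = []  # each cluster is a list of names; the first name is the representative
    for name, seq in seqs.items():
        for cluster in clusters:
            if identity(seqs[cluster[0]], seq) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters

def mutate(seq, positions):
    """Return a copy of seq with a substitution at each of the given positions."""
    out = list(seq)
    for i in positions:
        out[i] = "c" if out[i] != "c" else "g"
    return "".join(out)

# Invented 200 bp toy sequence; not a real gamma-gliadin.
base = "atgaagacct" * 20
clones = {
    "clone_a": base,
    "clone_b": mutate(base, [5]),                # 1 difference: 99.5% identical
    "clone_c": mutate(base, [5, 50, 100, 150]),  # 4 differences: 98% identical
}

clusters = greedy_cluster(clones)
print(clusters)
```

With a threshold this strict, a single PCR or sequencing error in a 200 bp stretch is absorbed into an existing cluster, whereas a handful of differences starts a new one, which is the conservative behaviour described above.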

For the phylogenetic analysis the genes cloned and
sequenced here were supplemented with sequences of
diploid Triticum and Aegilops species and of the related
genera Lophopyrum, Crithopsis, and Dasypyrum from
EMBL/Genbank (as present in August 2011). These
were organized in the same way in contigs of 99%
sequence similarity; a total of 145 sequences and 68
contigs (Table 1). All 127 contigs (59 composed of novel
sequences and 68 of EMBL/Genbank-derived sequences)
were trimmed to represent the same part of the gene.
One gamma-hordein sequence (AY338365 from Hordeum
chilense) and three secalins (EU368041 from
Secale cereale, EF432546 from Secale sylvestre, and
HQ266670 from Secale strictum) were included as
outgroups, as the sequence alignment already indicated that
they are more distant.

Both the nucleotide and the deduced amino acid
sequences of the gamma-gliadin dataset were aligned
using MEGA4 [22], and Maximum-Likelihood (ML)
analysis was performed with PhyML 3.0 (http://www.phylogeny.fr
[23,24]) using the GTR substitution model
for nucleotide data and the WAG model for amino acid
data. The SH-like approximate likelihood-ratio test was used
for estimation of branch support [25]. MEGA4 used the
complete alignment, while the ML program at PhyML
excluded all sites with deletions. When we used the
pairwise deletion option for neighbour joining (NJ) in
MEGA4 we obtained the same tree topology.

Table 1 The 127 gamma-gliadin sequences analysed in this study

Species               Accession  Genome  N seq  N contigs (99%)  Genes  Pseudo-genes  Length (bp)  Genbank accession nr

Cloned in this study
T. monococcum         CGN10542   Am      23     5                4      1             759-939      JQ269804-JQ269808
Ae. caudata           K2255      C       50     12               9      3             909-948      JQ269703-JQ269716
Ae. comosa            K2272      M       32     5                3      2             887-909      JQ269717-JQ269721
Ae. uniaristata       K650       N       35     7                6      1             873-924      JQ269742-JQ269750
Ae. mutica            K1581      T       35     8                7      1             882-909      JQ269722-JQ269729
Ae. tauschii          K1368      D       32     4                4      0             879-897      JQ269789-JQ269792
Ae. umbellulata       K1588      U       36     10               5      5             873-928      JQ269730-JQ269741
Ae. speltoides        CGN10682   S       11     3                3      0             648, 909     JQ269774-JQ269778
Ae. speltoides        CGN10684   S       81     5                5      0             873-1089     JQ269751-JQ269757
Total cloned in this study               335    59               46     13

Already present in Genbank/EMBL/DDBJ (in August 2011)
Ae. searsii                              9      3                3      0
Ae. bicornis                             13     3                3      0
Ae. longissima                           10     4                3      1
Ae. sharonensis                          8      5                4      1
Ae. speltoides                           11     3                3      0
Ae. tauschii                             10     4                4      0
T. monococcum                            30     14               13     1
T. urartu                                14     5                4      1
Crithopsis delileana                     2      2                2      0
Lophopyrum elongatum                     16     14               8      6
Dasypyrum sp.                            22     11               5      6
Total in Genbank/EMBL/DDBJ               145    68               52     16

The number of base differences per site, the number of
synonymous differences per synonymous site and the number of
non-synonymous differences per non-synonymous site,
averaged over all sequence pairs within each group
and over all sequences, were calculated using the method of
Nei and Gojobori [26] with incorporation of the Jukes-Cantor
correction in MEGA4. Standard error estimates
were obtained by a bootstrap procedure (1000 replicates).
Positions containing alignment gaps or missing data
were eliminated only in pairwise sequence comparisons
(pairwise deletion option). The ratio between non-synonymous
substitutions per site (dN) and synonymous substitutions
per site (dS) (dN/dS ratio) was calculated.
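
The two ingredients named here, pairwise deletion of gapped sites and the Jukes-Cantor correction, can be sketched in a few lines. This is a simplified illustration under stated assumptions: it works on whole nucleotide sites rather than on the separate synonymous and non-synonymous site classes that the Nei and Gojobori method distinguishes, the toy sequences are invented, and the function names `p_distance` and `jukes_cantor` are hypothetical rather than MEGA4 functions.

```python
import math

def p_distance(a, b, gap="-"):
    """Proportion of differing sites, skipping sites where either sequence has a gap
    (pairwise deletion)."""
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)

def jukes_cantor(p):
    """Jukes-Cantor corrected distance for an observed proportion of differences p."""
    if p >= 0.75:
        raise ValueError("distance is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

# Invented aligned toy sequences with a three-site gap in the first one.
seq1 = "atgaag---acc"
seq2 = "atgcagttcacc"

p = p_distance(seq1, seq2)  # 1 difference over the 9 sites present in both sequences
d = jukes_cantor(p)         # corrected for multiple substitutions at the same site
print(round(p, 4), round(d, 4))
```

The corrected distance is always at least as large as the observed proportion, because the correction accounts for substitutions hidden by later changes at the same site.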

To study the selection pressure on gamma-gliadin
sequences, the codon-based test for selection (Z-test)
was performed for the sequences of each group and for
the overall dataset. The variance was computed using
bootstrapping (1000 replicates). To analyse differences in
selection pressure on full open reading frame (ORF) and
pseudogene gamma-gliadin sequences, the number of
synonymous (Ks) and non-synonymous substitutions
(Ka) per site were calculated from pairwise comparisons
for ORF and pseudogene sequence pairs using the
method of Nei and Gojobori [26]. The values obtained
were used for a scatter plot in Excel.
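
The logic of a codon-based Z-test can be illustrated with the summary values reported for group 1 in Table 2 (dS = 0.191, S.E. 0.019; dN = 0.065, S.E. 0.006). The sketch below is a rough approximation and not the MEGA4 procedure: it assumes that the two standard errors are independent, whereas the study estimated the variance by bootstrapping, and the function names `z_statistic` and `lower_tail_p` are hypothetical.

```python
import math

def z_statistic(d_n, se_n, d_s, se_s):
    """Z = (dN - dS) / sqrt(Var(dN) + Var(dS)), treating the two errors as independent."""
    return (d_n - d_s) / math.sqrt(se_n ** 2 + se_s ** 2)

def lower_tail_p(z):
    """P(Z <= z) for a standard normal variable; small values support dN < dS."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

# Group 1 values taken from Table 2.
z_group1 = z_statistic(d_n=0.065, se_n=0.006, d_s=0.191, se_s=0.019)
p_group1 = lower_tail_p(z_group1)
print(round(z_group1, 2), p_group1)
```

A strongly negative Z, as obtained here, points to an excess of synonymous over non-synonymous change, which is the signature of purifying selection.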

Results 

Gamma-gliadin sequences 

In order to analyse genetic diversity and the evolution of
the gamma-gliadin multigene family, 335 gamma-gliadin
sequences were cloned and sequenced from species
representing all main genome types in Aegilops/Triticum (A, B/
S, C, D, M, N, U, and T genomes). The aim was to clone
and sequence a sufficient number of genes from each
accession to obtain representatives of all distinct groups. The
sequences were assembled into contigs at 99% similarity at
the nucleotide level (Additional file 1). The contigs with intact
open reading frames represented 46 different predicted
gamma-gliadin proteins (Table 1). Thirteen contigs (49
sequences) contained internal stop codons or frame-shift
mutations and were therefore considered to represent
pseudogenes. The fraction of pseudogene sequences
differed among the eight Aegilops/Triticum species analysed.
For example, more than half of all sequences of Ae.
umbellulata were pseudogenes (20 of 35 sequences in 5 of
10 contigs), while no pseudogene contigs were present
among 32 sequences from Ae. tauschii (Table 1).
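
The variation in pseudogene content can be made explicit at the contig level from the counts in Table 1. The short sketch below does this for the accessions cloned in this study; the dictionary layout and the function name `pseudogene_fraction` are illustrative choices, and the fractions are per contig, so they differ from the per-sequence percentages quoted in the text.

```python
# Contig counts per accession cloned in this study, copied from Table 1:
# (contigs with an intact ORF, pseudogene contigs)
cloned_contigs = {
    "T. monococcum CGN10542": (4, 1),
    "Ae. caudata K2255": (9, 3),
    "Ae. comosa K2272": (3, 2),
    "Ae. uniaristata K650": (6, 1),
    "Ae. mutica K1581": (7, 1),
    "Ae. tauschii K1368": (4, 0),
    "Ae. umbellulata K1588": (5, 5),
    "Ae. speltoides CGN10682": (3, 0),
    "Ae. speltoides CGN10684": (5, 0),
}

def pseudogene_fraction(genes, pseudogenes):
    """Share of contigs that are pseudogenes."""
    return pseudogenes / (genes + pseudogenes)

fractions = {name: pseudogene_fraction(g, p) for name, (g, p) in cloned_contigs.items()}
total_genes = sum(g for g, _ in cloned_contigs.values())
total_pseudo = sum(p for _, p in cloned_contigs.values())
overall = pseudogene_fraction(total_genes, total_pseudo)

# List accessions from highest to lowest pseudogene fraction.
for name, frac in sorted(fractions.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:26s} {frac:.2f}")
print(f"{'overall (contig level)':26s} {overall:.2f}")
```

The totals recovered this way (46 intact and 13 pseudogene contigs) agree with the totals row of Table 1.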

Figure 1 presents a schematic overview of the structure
of gamma-gliadins, after [9] and [27]. The sequences of
the predicted intact proteins varied considerably in length
due to variation in the length of the repetitive domain (II)
and the length of the glutamine-rich domain (IV). Most of
the sequence length variation was observed among Ae.
speltoides sequences, and both the shortest and the
longest sequences were isolated from Ae. speltoides.

Clustering and phylogenetic analysis 

An analysis of the sequences, with a gamma-hordein as
outgroup, resulted in a multiple sequence alignment
(Additional file 2 contains the nucleotide alignment,
Additional file 3 contains the amino acid alignment, both
in Nexus format). The maximum-likelihood (ML) tree
produced on the basis of the alignment contained a separate
cluster of secalins and two well-supported groups of
gliadins of unequal size: 53 consensus sequences belonged
to the first group and 74 belonged to the second group
(Additional file 4 contains the tree based on nucleotide
sequences, Figure 2 shows the tree based on deduced
amino acid sequences). In total six significant (bootstrap
support value 84% or higher) groups were observed, two
within the first branch (designated groups 1 and 2) and
four within the second branch (designated groups 3-6).
The groups contain sequences cloned here as well as
sequences obtained from Genbank, and Genbank
sequences do not form additional groups, indicating that
we have cloned and sequenced to sufficient depth.

Sequences of Ae. umbellulata (U), Ae. comosa (M),
Ae. mutica (T), Ae. tauschii (D), and all species with an S
genome (Ae. speltoides (S), Ae. searsii (Ss), Ae. bicornis
(Sb), Ae. sharonensis (Ssh) and Ae. longissima (Sl))
occurred in both branches and in at least two unrelated
groups (Figure 3). Sequences originating from Triticum
species with an A genome (T. monococcum (Am) and T.
urartu (Au)), and from the Aegilops species Ae. caudata (C) and
Ae. uniaristata (N), were restricted to the second branch.
Within this second branch, all gamma-gliadin sequences
from T. monococcum (Am) and T. urartu (Au) clustered
in group 4. Group 3 consisted only of Ae. caudata (C)
sequences, and it included all of them except one that
was present in group 6. All groups except the Ae.
caudata-specific group 3 included a mixture of sequences of
three to seven species of Aegilops/Triticum. Each of the
groups included terminal branches that are mainly
species/genome-specific.

Figure 1 Schematic overview of the structure of gamma-gliadins. The proteins consist of a short N-terminal signal peptide (S) followed by a
unique N-terminal domain (I) and a repetitive domain (II). Domain III contains most (often 6) of the cysteines. IV is rich in glutamine. Two
conserved cysteines are in V. Eight cysteine residues (indicated with vertical lines) can form four interchain disulfide bonds (indicated as
connections between lines). Figure after [9].

Figure 2 Maximum-Likelihood (ML) tree of the gamma-gliadins
(based on amino acid sequences). A maximum-likelihood (ML)
analysis was performed with PhyML 3.0. The SH-like approximate
likelihood-ratio test was used for estimation of branch support.
Proteins with a length in the alignment of less than 200 amino acids
were excluded from the analysis. The gamma-gliadins fall into six
groups (1-6 on the right) in two branches (1-2 and 3-4-5-6). Key for
the sequence codes in Additional file 1.

The gliadin sequences of Dasypyrum, Lophopyrum and 
Crithopsis included in the analysis were also positioned 
within the two branches despite the fact that Triticum 
and Aegilops are much more closely related and treated 
as one large genus by some authors [28,29]. The 
sequences of Lophopyrum clustered in groups 2 and 6, 
sequences of Dasypyrum clustered in groups 1 and 4 (in 
group 4 only pseudogenes, visible in the nucleotide max- 
imum likelihood (ML) tree in Additional file 4), and 
those from Crithopsis clustered in group 1. Only 
groups 3 and 5 contained exclusively sequences of 
Aegilops/Triticum species. 

Genetic variation within and among the groups 

The most polymorphic sequences were found in group
1. This group of sequences varied in length from 762 to
1089 bp, which means that it includes many of the
shortest and all of the longest variants of the whole
study. They were highly polymorphic with a codon-based
evolutionary divergence (d) of 0.089 ± 0.005
(dS = 0.191, dN = 0.065) (Table 2, Additional file 5). Genes
of this group are only maintained in the D and various S
genomes and in the genera Lophopyrum, Crithopsis, and
Dasypyrum. They occur as pseudogenes in the U and M
genomes (Figure 3). It thus appears that group 1 has
undergone intensive diversification and death processes
in most of the species analysed.

Figure 3 Occurrence and absence of genes from different ancestral groups across the Aegilops/Triticum genomes. Overview of the
occurrence of genes from the six groups recognised from the maximum likelihood trees (Figure 2, Additional file 4), represented here as Gr1-Gr6
to the left, in all taxa for which we have gamma-gliadin sequences, sorted by genome (at the top). Color in a cell means present, empty means
absent. Text in cells indicates additional features: 'pseudo' means that all sequences represent pseudogenes (i.e., with stop codons); '9 cys'
indicates that all genes contain exactly 9 cysteines (all other gamma-gliadins generally contain 8 cysteines).

The least polymorphic are the group 6 gamma-gliadins.
They are present in seven Aegilops genome types
(T, D, U, C, N, M and S (only Ae. speltoides)) and in
Lophopyrum. The Aegilops sequences of this group all
have the same deduced ORF length of 909 bp, coding
for a 302-amino-acid gliadin protein. The average
codon-based evolutionary divergence over sequence
pairs within this group (d) is 0.041 ± 0.004 (dS = 0.087,
dN = 0.029), which is only about half that of the group 1 gliadins.
Interestingly, all Aegilops sequences of group 6 have an
additional cysteine residue, whereas in Lophopyrum
sequences of group 6 the additional cysteine is not
present, and here the predicted length of the protein is
not 302 amino acids either. The cysteine can easily be
formed by a single nucleotide change (TCC to TGC).
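
Both numerical statements in this paragraph can be checked with a few lines. The sketch below uses only the two codons named in the text (TCC encodes serine and TGC encodes cysteine in the standard genetic code) and assumes, as a labelled assumption, that the quoted ORF lengths include the stop codon. The names `CODON_TO_AA`, `n_differences` and `protein_length` are hypothetical.

```python
# Subset of the standard genetic code, sufficient for this example.
CODON_TO_AA = {"TCC": "Ser", "TGC": "Cys"}

def n_differences(codon_a, codon_b):
    """Number of nucleotide positions at which two codons differ."""
    return sum(a != b for a, b in zip(codon_a, codon_b))

def protein_length(orf_bp):
    """Amino acids encoded by an ORF whose length includes the stop codon
    (an assumption made for this sketch)."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    return orf_bp // 3 - 1

change = (CODON_TO_AA["TCC"], CODON_TO_AA["TGC"], n_differences("TCC", "TGC"))
lengths = [protein_length(bp) for bp in (648, 909, 1089)]
print(change, lengths)
```

Under this assumption the ORF lengths of 648, 909 and 1089 bp correspond to proteins of 215, 302 and 362 amino acids, matching the protein lengths quoted in the text.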



Table 2 Estimates of average evolutionary divergence over sequence pairs within groups (pairwise deletion)

              Synonymous        Non-synonymous    All
              mutations only    mutations only    substitutions
Groups        dS      S.E.      dN      S.E.      d       S.E.      dN/dS
Gr 1          0.191   0.019     0.065   0.006     0.089   0.005     0.340
Gr 2          0.092   0.014     0.03    0.004     0.042   0.004     0.326
Gr 3          0.072   0.012     0.034   0.005     0.044   0.005     0.472
Gr 4          0.177   0.019     0.048   0.006     0.074   0.005     0.271
Gr 5          0.111   0.016     0.043   0.005     0.059   0.004     0.387
Gr 6          0.087   0.012     0.029   0.004     0.041   0.004     0.333
Gr 1+2        0.198   0.019     0.066   0.005     0.09    0.006     0.333
Gr 3+4+5+6    0.187   0.017     0.065   0.006     0.09    0.005     0.348
overall       0.249   0.019     0.085   0.07      0.115   0.006     0.341

The Aegilops species that do not have group 6 gliadins
are the S genome species except Ae. speltoides (Ss, Sb,
Ssh, Sl genomes), all of which have group 5 gliadins
(Figure 3). These gliadins, although distinct in sequence
composition, have the same length of 302 amino acids
as the group 6 gliadins and also have an additional
cysteine in the same position (except FJ006687, which has a
large deletion). As a consequence, each Aegilops species
contains a group of 9-cysteine gliadins, either from
group 6 or from group 5. The U and N genomes contain
group 6 sequences and group 5 sequences but, in contrast
to group 5 sequences from S-genome Aegilops species,
the U and N sequences from group 5 all contain
only eight cysteines and are variable in length.
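
The statement that every Aegilops genome carries exactly one source of 9-cysteine gliadins can be restated as a small set computation. The sets below encode only what the preceding paragraphs say about groups 5 and 6; they are a hand-made encoding of the prose, not a machine-readable version of Figure 3, and the variable names are hypothetical.

```python
# Aegilops genomes with group 6 gliadins (all with 9 cysteines), as listed in the text.
group6_nine_cys = {"T", "D", "U", "C", "N", "M", "S"}
# S-genome species other than Ae. speltoides: group 5 gliadins with 9 cysteines.
group5_nine_cys = {"Ss", "Sb", "Ssh", "Sl"}
# U and N also carry group 5 gliadins, but with 8 cysteines and variable length.
group5_eight_cys = {"U", "N"}

aegilops_genomes = group6_nine_cys | group5_nine_cys

# Genomes that would carry 9-cysteine gliadins from both groups (expected: none).
both_sources = group6_nine_cys & group5_nine_cys

# For every genome, record which group supplies its 9-cysteine gliadins.
nine_cys_source = {
    genome: "group 6" if genome in group6_nine_cys else "group 5"
    for genome in sorted(aegilops_genomes)
}
print(nine_cys_source)
```

In this encoding the two 9-cysteine sets are disjoint and together cover all eleven Aegilops genomes considered, while U and N are the only genomes that combine group 6 with an 8-cysteine form of group 5.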

Selection 

The codon-based test for selection (Z-test) showed
evidence for purifying selection in each of the six groups of
sequences and also overall (Table 2). The ratio between
synonymous and non-synonymous substitutions per site
(dS/dN) for pairwise comparisons of sequences showed a
relative excess of synonymous over non-synonymous
substitutions in genes with a full open reading
frame compared to genes with stop codons (pseudogenes)
(see the trend line in Additional file 5). The
difference in the ratios is comparable to that obtained
for intact and pseudogene alpha-gliadins [4], but some of
the values for dS as well as dN are higher, indicating that
gamma-gliadins are an evolutionarily older family.

Discussion 

The main genomes within the Aegilops/Triticum group
(A, S/B, C, D, M, N, T, U) have split within an evolutionarily
short period, 2.5 to 4.5 MYA [30]. Multigene families
have expanded in the same period as these genomes
split. Here we obtained 59 new gamma-gliadin genes
from eight genomes, and have analysed these data together
with gene sequences in Genbank in the frame of
gains and losses of groups of gamma-gliadin genes
during the evolution of these species. This has produced
new insight into how this multigene family has developed.
Among the diversity of genes some groups show a
remarkable stability of protein length and number of
cysteines, suggesting functional relevance.

A model for the evolution of gamma-gliadins 

Evolution of multigene families occurs by duplication of
gene clusters [31,32]. Gao et al. [33] showed evidence for
multiple rounds of segmental duplication of omega-gliadin
genes in wheat. The evolution of the gamma-gliadins
appears to fit the birth-and-death evolutionary model
[34]. The sequence data obtained here allowed us to
distinguish six groups of closely related gamma-gliadins
(Figures 2 and 3, Additional file 4), which appear to be
organised in two branches. These two ancestral branches
predate the MRCA of the Aegilops/Triticum clade, as they
also include sequences from the genera Lophopyrum,
Crithopsis, and Dasypyrum. A hordein sequence from
Hordeum and the secalins from Secale clustered outside
the two main branches. A recent phylogenetic study of the
Triticeae based on one chloroplast and 26 nuclear gene
sequences [35] placed Secale closer to Aegilops and
Triticum than Dasypyrum, but also noted that the clade
grouping these genera had evolved in a reticulated manner,
and that their relationships are better represented by a
multigenic network.

Based on a careful examination of the presence and
absence of the six groups of gamma-gliadins we present
a model for the evolution of this multigene family during
the evolution of the Aegilops/Triticum group (Figure 4). Note
that in this model the order of the groups along the
chromosome is arbitrary, and that repetitive DNA and
non-gamma-gliadin genes that are present between
gamma-gliadins [33] have been omitted. While developing
this model we have assumed that our set of
sequences (both cloned here and obtained from
Genbank) is sufficiently deep that no particular
groups have been missed. Evidence supporting this notion is that
(i) our sequences, obtained using PCR primers
designed by us, fall into the same six groups as those
of other diploid taxa from Genbank; (ii) all groups
except the Ae. caudata-specific group 3 included a
mixture of sequences of three to seven species of
Aegilops/Triticum; (iii) the number of genes from one
genome was not correlated with the number of
groups into which they clustered. All Ae. caudata
genes but one ended up in group 3, although we had
cloned 12 different genes. T. monococcum genes
ended up only in the lower branch, although we had as
many as 19 different genes (Table 1). Finally, (iv) four
of these groups were also recognised by other studies.
One of the two groups missed by Wang et al. [14]
was the Ae. caudata-specific group 3.

Gamma-gliadin duplication, pseudogenisation, and loss 
during Aegilops/Triticum genome evolution 

The six groups of gamma-gliadins fall into two branches:
one including group 1 and group 2 genes, and one
including groups 3 to 6. In our evolutionary model the
MRCA of the Aegilops/Triticum spp. already has four
distinct groups of differentiated gamma-gliadin
sequences, i.e., two from each branch (groups 1, 2, 4 and
6; Figure 4). Almost all extant Aegilops/Triticum genomes
include several distinct groups of gamma-gliadins.
The only exception is the A genome of Triticum, which
contains only group 4 gliadins. Consequently, its position
in the model is the least supported, as loss of the
other groups may have occurred at several points in
time. The T genome lost group 4 and group 1 gliadins.
A major split is between the D genome and the S
genomes, which have lost the group 4 gliadins but
maintained group 1 plus group 2 gliadins, and the genomes that
lost group 1 and group 2 gliadins (M, N, U, C genomes). It
is likely that these lineages have split from the MRCA of
the other Aegilops genomes very early. This is consistent
with taxonomic studies. T. monococcum and T. urartu,
carrying two different modifications of the A genome, are
usually treated together with polyploids carrying the A
genome as a separate genus, Triticum [19,36-40]. Ae.
mutica (T) appears to represent a separate evolutionary
line within Aegilops/Triticum as this species shows many
primitive characters. In some classifications it is treated as
a separate genus, Amblyopyrum [19,41], or placed within a
separate monotypic subgenus, Amblyopyrum, within
Aegilops [39]. Cytogenetic studies [42] confirmed this
isolated position. The D genome of Ae. tauschii was
already regarded by early cytogenetic studies as a rather
well-separated lineage [43]. Some DNA marker-based
studies placed it at a basal position in the Aegilops/Triticum
group [44-46].

Figure 4 Model for the evolution of groups of gamma-gliadins in Aegilops/Triticum. The six groups proposed are based on the ML tree
(Figure 2, Additional file 4) and occur in genomes as summarised in Figure 3. Note that in this model the order of the groups on the
chromosome is arbitrary, and duplications of genes within each group are ignored. The occurrence of pseudogenes is only indicated when it
affected complete groups, but some pseudogenes may occur in all groups. Note that each genome has either group 6 gliadins or group 5
gliadins with nine cysteines and constant length.
Legend: pseudogenisation; deletion; group 1; group 2; group 4; group 6; group 5A (9 cysteines, 806 bp length); group 5B (8 cysteines,
variable length); group 3.

According to our model, the most recent common ancestor
(MRCA) of the S genomes probably gained the group 5
gliadins. Ae. searsii (S s ), Ae. bicornis (S b ), Ae. sharonensis
(S sh ) and Ae. longissima (S l ) all have sequences of group
5 but none of group 6. Ae. speltoides (S) has group 6
sequences but none of group 5, consistent with its being
the most divergent of the species of section Sitopsis
[46-53]. Note that Eig [37] put Ae. speltoides in a separate
subsection, Truncata, on the basis of morphological
evidence. As the S genome species together are well
separated from all other Aegilops species, some authors
considered them more closely related to Triticum than
to other Aegilops species [54,55].

The species Ae. caudata (C), Ae. umbellulata (U), Ae.
comosa (M) and Ae. uniaristata (N) share a common
node in our model, representing a hypothetical common
ancestor that was differentiated from all other genomes
by the combination of pseudogenes in group 1
gamma-gliadins and the absence of group 2 gamma-gliadins.
From this ancestor the N and M genomes maintained
group 4 gliadins, while the C and U genomes lost them.
The similarity of Ae. caudata to Ae. umbellulata and of Ae.
comosa to Ae. uniaristata was already proposed by
Kihara [43] and Lucas and Jahier [56] based on cytogenetic
analysis, and by Dvorak and Zhang [48] based on
RFLP data. A recent phylogenetic analysis of chloroplast
haplotypes also showed similarity between the genomes
of Ae. comosa, Ae. uniaristata and Ae. caudata [57].

Evolution and selection of gamma-gliadins 

A high level of genetic diversity was observed among
gamma-gliadins, similar to the results of [3,10] and [14].
The number of groups in each genome reflects a more
complicated evolution, over a longer period of time, than
that of e.g. the alpha-gliadins of locus Gli-2 on chromosome 6,
which have been suggested to originate from a gliadin
locus on chromosome 1 through a translocation event
[5]. At the same time, the gamma-gliadins contain fewer
pseudogenes than the 90% reported for alpha-gliadins [4].
The codon-based test for selection (Z-test) showed evidence
for purifying selection in all groups of gamma-gliadin
sequences (Table 2, Additional file 5) and at higher levels
in intact genes than in pseudogenes. What mechanism made
the gamma-gliadins split into separate groups, why is
purifying selection stronger, and why do they have
relatively few pseudogenes? One clue may come from
the fact that the strength of selection, the variation in
sequence length and in the number of cysteines, and
the percentage of pseudogenes are clearly different between
the six groups (Figure 3). This is most readily understood
by comparing the most conserved and most
polymorphic groups.
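
As a rough illustration of how the comparison between intact genes and pseudogenes can be summarised, the sketch below fits a trendline through the origin, as described for Additional file 5. It assumes Ks on the horizontal axis and Ka on the vertical axis, in which case the least-squares slope is sum(Ks*Ka)/sum(Ks^2); a slope below 1 means Ka < Ks, which is the usual signature of purifying selection. The function name and all numbers are hypothetical and are not data from this study.

```python
def slope_through_origin(ks, ka):
    """Least-squares slope b of ka = b * ks with the intercept fixed at zero."""
    if len(ks) != len(ka) or not ks:
        raise ValueError("ks and ka must be non-empty and of equal length")
    sum_xx = sum(x * x for x in ks)
    if sum_xx == 0:
        raise ValueError("all Ks values are zero; the slope is undefined")
    return sum(x * y for x, y in zip(ks, ka)) / sum_xx


# Hypothetical pairwise values, for illustration only (not from the paper).
ks_intact = [0.10, 0.20, 0.40]
ka_intact = [0.03, 0.06, 0.12]
ks_pseudo = [0.10, 0.20, 0.40]
ka_pseudo = [0.06, 0.12, 0.24]

slope_intact = slope_through_origin(ks_intact, ka_intact)
slope_pseudo = slope_through_origin(ks_pseudo, ka_pseudo)

# A lower slope for intact genes than for pseudogenes would be consistent
# with stronger purifying selection acting on the intact genes.
print(f"intact: {slope_intact:.2f}, pseudogenes: {slope_pseudo:.2f}")
```

Fixing the intercept at zero reflects the expectation that sequences with no synonymous divergence also show no non-synonymous divergence.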

The most polymorphic is group 1, in which the genes
encode proteins with 8 cysteines, which would allow
them to be present as monomers. Deduced full
sequences of this group varied in length from 762 (an
Ae. searsii sequence from GenBank) to 1089 bp, which
means that this group contains some of the shortest and
all of the longest variants of the whole study. They were
also the most polymorphic in terms of sequence divergence,
and the group is lost in many lineages (only maintained
in Lophopyrum, Crithopsis, Dasypyrum, and the D and
various S genomes) or consists of pseudogenes only (U and
M genomes). This suggests that, insofar as group 1 proteins
perform any biological function, they are interchangeable
with gliadins from other groups.

The most conserved are the group 6 gamma-gliadins,
present in almost all Aegilops genome types (T, D, U, C,
N, M and S (only Ae. speltoides)) and in Lophopyrum.
They all have an odd number of cysteines, namely nine. The
odd number of cysteines would allow these proteins
to become linked to a gluten network and function as a
chain terminator. This particular group of gliadins is
highly conserved in length (all are 302 amino acids),
except in Lophopyrum, where the additional cysteine is not
present. The Aegilops species that do not have group 6
gamma-gliadins are the S genome species (except Ae.
speltoides), all of which have group 5 gamma-gliadins,
which are distinct in sequence composition but have the
same length as the group 6 gliadins and have an
additional cysteine in the same position. As a result, each
Aegilops species has a group of 9-cysteine gamma-gliadins
of a specific and conserved length. This strongly
suggests that these 302 amino acid, 9-cysteine
gamma-gliadins perform a specific function, possibly in relation
to the gluten network formation during protein body
formation in developing wheat grains. The traditional
idea that gamma-gliadins have no free cysteines, and
that all four S-S linkages (corresponding to 8 cysteines)
are intramolecular, thus preventing gliadins from
participating in the polymeric structure of glutenin, is clearly
too simple. Altenbach et al. [58] already found several of
these odd-numbered gamma-gliadins, but not yet in all
genomes. The cysteines may be functional in combination
with a fixed length if that provides a particular
secondary structure (beta-reverse turns [59], possibly also
related to their capability to function as chain terminators
in the polymer network).

Triticeae. A maximum-likelihood (ML) analysis was performed with
PhyML 3.0 using the GTR-substitution model. The SH-like approximate
likelihood-ratio test was used for estimation of branch support.
Sequences shorter than 600 bp in the alignment were
excluded from the analysis. The gamma-gliadins fall into six groups
(1-6 on the right) in two branches (1-2 and 3-4-5-6). Key for the
sequence codes in Additional file 1.

Additional file 5: Ks/Ka ratio of intact and pseudogene
gamma-gliadins. Scatter plot of the numbers of synonymous substitutions (Ks)
and non-synonymous substitutions (Ka) per site for pairwise comparisons
among full open reading frame gamma-gliadins and pseudogene
sequences. Linear trendlines with the intercept set to zero are shown
both for full open reading frame (ORF) sequences and pseudogene
sequences.

Upelniek et al. [60] showed that differences in gliadin 
allele composition of Gli-1 loci among bread wheat var- 
ieties were correlated with differences in proteolysis 
rates during germination. Nevertheless, and apparently 
in contrast to the notion of specific functionality of at 
least some gamma-gliadins, hexaploid wheat appears to 
tolerate the loss of most or all gamma-gliadin proteins, 
as spring wheat cultivar Bobwhite grains remained viable 
when gamma-gliadin gene expression was mostly elimi- 
nated with RNAi [61] or when the bulk of all gliadins 
was silenced using an RNAi construct based on a con- 
served region from alpha-, gamma- and omega-gliadins 
[62]. However, Gil-Humanes et al. [63] did observe ir- 
regularities in the development of protein bodies in the 
endosperm when all gliadins were down-regulated, not 
only the gamma-gliadins. The effect of a reduction of 
gamma-gliadins by RNAi in commercial cultivars [64,65] 
or as a result of deletions in 'Chinese Spring' [66] is an 
increase in dough strength, which is consistent with a 
chain termination activity of part of the gamma-gliadins. 
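
The chain-terminator argument rests on cysteine parity: eight cysteines can all be paired in four intramolecular disulfide bonds, whereas nine necessarily leave one cysteine without an intramolecular partner. The sketch below shows how this parity check could be applied to deduced protein sequences. It is a simplification that only counts residues and says nothing about which cysteines actually pair; the function names and the toy sequences are hypothetical and are not real gamma-gliadins.

```python
def cysteine_profile(protein):
    """Count cysteines and report how many can pair within the molecule."""
    n_cys = protein.upper().count("C")
    return {
        "cysteines": n_cys,
        "max_intramolecular_pairs": n_cys // 2,
        "unpaired": n_cys % 2,
    }


def may_terminate_chain(protein):
    """True when an odd cysteine count leaves one residue unpaired."""
    return cysteine_profile(protein)["unpaired"] == 1


# Toy sequences for illustration only (hypothetical, not real gliadins).
eight_cys = "MKC" + "QPC" * 7   # 8 cysteines
nine_cys = eight_cys + "C"      # 9 cysteines

for name, seq in [("eight_cys", eight_cys), ("nine_cys", nine_cys)]:
    print(name, cysteine_profile(seq), may_terminate_chain(seq))
```

Under this simplification, a protein flagged by `may_terminate_chain` could form one intermolecular bond with the gluten network but could not extend the chain further.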

Conclusion 

We have studied the evolution of gamma-gliadins in dip- 
loid species of Aegilops/Triticum representing all main 
genome types in the group. Wide sampling enabled us 
to show that gamma-gliadins are represented by six 
diverged groups of genes that occur in different combi- 
nations across the genomes. The current gamma-gliadin 
composition in each of the genomes is the result of mul- 
tiple gene duplication and divergence events followed by 
pseudogenisation within groups as well as loss of groups 
of genes during genome evolution. We have presented a
possible model for duplications and deletions of groups
of genes that proposes that at least some duplications
predate the most recent common ancestor of all
Aegilops/Triticum genomes that currently exist. Although
the length and repeat composition are variable among
genes, one specific type, a nine-cysteine
gamma-gliadin of 302 amino acids, occurs in all Aegilops
genomes, and these proteins may have a function in
protein network formation.